AITopics | action selection

Collaborating Authors

action selection

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

2bd235c31c97855b7ef2dc8b414779af-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 06:35:50 GMT

artificial intelligence, lemma 3, relation, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.46)

Add feedback

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Neural Information Processing SystemsApr-25-2026, 06:35:46 GMT

Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: distribution shift and scalability. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

c1722a7941d61aad6e651a35b65a9c3e-Supplemental.pdf

Neural Information Processing SystemsFeb-11-2026, 00:18:42 GMT

agent, experiment, ick, (14 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

ImproveAgentswithoutRetraining: ParallelTree SearchwithOff-PolicyCorrection

Neural Information Processing SystemsFeb-8-2026, 00:55:03 GMT

Here, we focus ourattention onthesecond case, which leads toscore improvement without anyre-training.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Add feedback

Sampling Networks and Aggregate Simulation for Online POMDP Planning

Neural Information Processing SystemsDec-25-2025, 16:58:14 GMT

The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDP) based on the idea of aggregate simulation. The algorithm uses product distributions to approximate the belief state and shows how to build a representation graph of an approximate action-value function over belief space.

name change, representation, sampling network and aggregate simulation, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Neural Information Processing SystemsDec-23-2025, 22:33:42 GMT

Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: \textit{distribution shift} and \textit{scalability}. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories.

name change, parallel tree search, retraining, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.41)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

Active Inference with Reusable State-Dependent Value Profiles

Poschl, Jacob

arXiv.org Machine LearningDec-16-2025

Adaptive behavior in volatile environments requires agents to deploy different value-control regimes across latent contexts, but representing separate preferences, policy biases, and action confidence for every situation is intractable. We introduce value profiles: a small set of reusable bundles of value-related parameters--outcome preferences, policy priors, and policy precision--that are assigned to hidden states in the generative model. As posterior beliefs over states evolve trial-by-trial, effective control parameters emerge through belief-weighted mixing, enabling state-conditional strategy recruitment without maintaining independent parameters for each situation. We evaluate this framework in probabilistic reversal learning, comparing static precision, entropy-coupled dynamic precision, and profile-based models using cross-validated log-likelihood and information criteria. Model comparison using AIC favors the profile-based model over simpler alternatives ( 100-point differences), with consistent parameter recovery demonstrating structural identifiability even when context must be inferred from noisy observations. Model-based inference suggests that, in this task, adaptive control operates primarily through policy prior modulation rather than policy precision modulation, with gradual belief-driven profile recruitment confirming state-conditional rather than merely uncertainty-driven control. Overall, reusable value profiles provide a tractable computational account of belief-conditioned value control in volatile environments, providing a reusable, mode-like representational scheme for behavioral flexibility that yields testable signatures of belief-conditioned control.

adaptation, inference, precision, (17 more...)

arXiv.org Machine Learning

2512.11829

Country: Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
(2 more...)

Add feedback

Networked Restless Multi-Arm Bandits with Reinforcement Learning

Zhang, Hanmo, Sun, Zenghui, Wang, Kai

arXiv.org Artificial IntelligenceDec-9-2025

Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied in resource allocation and intervention optimization challenges in public health. However, traditional RMABs assume independence among arms, limiting their ability to account for interactions between individuals that can be common and significant in a real-world environment. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We define the Bellman equation for networked RMAB and present its computational challenge due to exponentially large action and state spaces. To resolve the computational challenge, we establish the submodularity of Bellman equation and apply the hill-climbing algorithm to achieve a $1-\frac{1}{e}$ approximation guarantee in Bellman updates. Lastly, we prove that the approximate Bellman updates are guaranteed to converge by a modified contraction analysis. We experimentally verify these results by developing an efficient Q-learning algorithm tailored to the networked setting. Experimental results on real-world graph data demonstrate that our Q-learning approach outperforms both $k$-step look-ahead and network-blind approaches, highlighting the importance of capturing and leveraging network effects where they exist.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2512.06274

Country: North America > United States (0.47)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Epidemiology (0.68)
Health & Medicine > Public Health (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

Tang, Sizhe, Chen, Jiayu, Lan, Tian

arXiv.org Artificial IntelligenceNov-11-2025

Monte Carlo Tree Search (MCTS), which leverages Upper Confidence Bound for Trees (UCTs) to balance exploration and exploitation through randomized sampling, is instrumental to solving complex planning problems. However, for multi-agent planning, MCTS is confronted with a large combinatorial action space that often grows exponentially with the number of agents. As a result, the branching factor of MCTS during tree expansion also increases exponentially, making it very difficult to efficiently explore and exploit during tree search. To this end, we propose MALinZero, a new approach to leverage low-dimensional representational structures on joint-action returns and enable efficient MCTS in complex multi-agent planning. Our solution can be viewed as projecting the joint-action returns into the low-dimensional space representable using a contextual linear bandit problem formulation. We solve the contextual linear bandit problem with convex and $μ$-smooth loss functions -- in order to place more importance on better joint actions and mitigate potential representational limitations -- and derive a linear Upper Confidence Bound applied to trees (LinUCT) to enable novel multi-agent exploration and exploitation in the low-dimensional space. We analyze the regret of MALinZero for low-dimensional reward functions and propose an $(1-\tfrac1e)$-approximation algorithm for the joint action selection by maximizing a sub-modular objective. MALinZero demonstrates state-of-the-art performance on multi-agent benchmarks such as matrix games, SMAC, and SMACv2, outperforming both model-based and model-free multi-agent reinforcement learning baselines with faster learning speed and better performance.

artificial intelligence, machine learning, malinzero, (18 more...)

arXiv.org Artificial Intelligence

2511.06142

Genre: Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (0.92)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation

Yokozawa, Riko, Fujii, Kentaro, Nomura, Yuta, Murata, Shingo

arXiv.org Artificial IntelligenceOct-28-2025

Autonomous robotic navigation in real-world environments requires exploration to acquire environmental information as well as goal-directed navigation in order to reach specified targets. Active inference (AIF) based on the free-energy principle provides a unified framework for these behaviors by minimizing the expected free energy (EFE), thereby combining epistemic and extrinsic values. To realize this practically, we propose a deep AIF framework that integrates a diffusion policy as the policy model and a multiple timescale recurrent state-space model (MTRSSM) as the world model. The diffusion policy generates diverse candidate actions while the MTRSSM predicts their long-horizon consequences through latent imagination, enabling action selection that minimizes EFE. Real-world navigation experiments demonstrated that our framework achieved higher success rates and fewer collisions compared with the baselines, particularly in exploration-demanding scenarios. These results highlight how AIF based on EFE minimization can unify exploration and goal-directed navigation in real-world robotic settings.

artificial intelligence, machine learning, navigation, (18 more...)

arXiv.org Artificial Intelligence

2510.23258

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.64)

Add feedback